Harp-DAAL: A High Performance Data-Intensive Machine Learning Framework
Abstract
Many data analytics and machine learning problems now involve millions or billions of training data points and model parameters, so the Distributed Processing mode is the only practical choice for many applications. Within DAAL's framework, the communication layer of the Distributed Processing mode is left to the user, who may choose Hadoop, Spark, MPI, or any user-defined middleware. The goal of our project is thus to integrate Harp (a Hadoop plugin) into the Distributed Processing mode of DAAL. Harp has the following advantages:
Similar resources
Development of Harp-DAAL Interface
Many data analytics and machine learning problems now involve millions or billions of training data points and model parameters, so the Distributed Processing mode is the only practical choice for many applications. Within DAAL's framework, the communication layer of the Distributed Processing mode is left to the user, who may choose Hadoop, Spark, MPI, or any user-defined middleware...
Harp: a Machine Learning Framework on Top of the Collective Communication Layer for the Big Data Software Stack
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where many low-level representations of images exist, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this w...
Publication date: 2017